ABSTRACT
Adolescent idiopathic scoliosis (AIS) is the most common nondegenerative spinal abnormality. Research indicates a strong correlation between genetics and AIS, with heritability estimates of 87.5%. However, the rarity of shared causative genes among patients, and the difficulty of replication between studies suggest that AIS is a highly complex polygenic disease. In this study, we utilized whole-genome sequencing (WGS) to comprehensively explore the genetic landscape of 119 AIS patients from 103 families. We implemented an automated WGS analysis pipeline powered by RareVisionTM consisting of automated algorithms and manual curation. We identified clinically relevant candidate variants in 20/119 (16.8%) patients and potentially relevant strong or moderate candidate variants in another 73/119 (61.3%) patients. Candidate variants included coding and noncoding point mutations, along with structural variants and large indels, showing the utility of WGS. Candidate genes included AIS-associated genes (e.g. CHD7, COL11A1/2, FBN1/2, HSPG2, KIF7), as well as genes associated with other musculoskeletal and developmental syndromes where scoliosis is a known symptom (e.g. RYR1, GJB2, MYH2, MYH7). Association analysis showed 4 known AIS single nucleotide polymorphisms (rs12946942, rs10756785, rs3904778, rs7633294) also correlated with AIS in our cohort. Finally, through gene set enrichment analysis, we were able to identify 3 gene clusters involved in skeletal muscle contraction, extracellular matrix composition, and gene expression regulation. In summary, through scalable WGS-based familial testing we were able to (1) find clinically relevant genetic variations in the majority of our patients and (2) create a large dataset that allowed us to identify biological pathways relevant to AIS etiopathogenesis.
INTRODUCTION
Adolescent idiopathic scoliosis (AIS) is the most common nondegenerative spinal abnormality with a prevalence of 1−4%.1–3 It is diagnosed when a three-dimensional spinal deformity measured by the Cobb angle exceeding 10° of magnitude in the coronal plane.4 Severe progressive curvatures can lead to serious complications such as cardiac or respiratory compromise, back pain, and other degenerative disease, as well as psychological issues related to cosmetic deformity, which can have a long-term impact on the patient’s quality of life.5 For mild scoliosis, brace therapy is often combined with physiotherapy, and for severe individuals, surgical treatments are required.
Despite growing research for decades, the etiopathogenesis of AIS remains unclear. Several hypotheses of AIS etiology involving genetics, central nervous system, biomechanics, metabolic pathways, spinal cord growth, endocrine factors, sex hormones, bone metabolism, and epigenetics have been proposed.6,7 Epidemiological studies have identified high heritability and marked sexual dimorphism in AIS, with significant female predominance.1 Multiple literatures support the genetic foundation of AIS, with sibling recurrence risks reported to be 17.7%, heritability estimates of approximately 87.5%,8 and odds ratio (OR) for developing AIS being 1.5-fold higher for the participants whose mother had scoliosis.9 Nevertheless, the rarity of causative genes that have been clearly identified as ‘AIS genes’ and the lack of replication of certain variants between genetic analysis in different study cohorts suggest that AIS is a highly complex polygenic disease that results from the interaction of multiple gene loci and the environment.10–12
With the advancement of next-generation sequencing, molecular genetics diagnostic methods using targeted sequencing or whole exome sequencing (WES) have been widely utilized. Most AIS genetic researches have mostly utilized targeted sequencing, WES, and single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS).13–28 However, mutations in regulatory or intronic regions, difficult-to-sequence repetitive sequences, and the presence of novel genes are challenging to uncover with targeted sequencing or WES, and the complex genomic changes including structural variations (SVs) are difficult to identify with SNP-based GWAS. Whole genome sequencing (WGS), covering the entire genome allows for a more comprehensive exploration of the extended genetic landscape. The clinical utilization of WGS has been limited due to high cost and excessive data, however, a fall in the costs of high-throughput sequencing and advances in bioinformatic analyses have increased the potential for WGS to be used in clinical practice.29
In this study, we comprehensively explored the genetic landscape in 103 probands with AIS utilizing WGS. For WGS analysis, we implemented an automated WGS analysis pipeline powered by RareVisionTM consisting of automated algorithms and manual curation by specialists of genomic medicine. The main goal of this study was to find new candidate pathways and functional categories involved in scoliosis by analyzing a large cohort of AIS patients.
MATERIAL AND METHODS
Study Subjects
This is a retrospective analysis of patients with AIS at the Research for Precision Medicine study at Shriners Hospitals for Children – The Genome Institute. The study was approved by the Institutional Review Board of Shriners Hospitals for Children (IRB approval number: 9-21-2022) and adhered to the tenets of the Declaration of Helsinki. All patients underwent an orthopedic examination to confirm the idiopathic nature of the scoliosis. Patients with congenital scoliosis and possible conditions that cause neuromuscular scoliosis including cerebral palsy, Duchenne muscular dystrophy, myelomeningocele, spinal muscular atrophy, or Friedrich ataxia were excluded from the study. The radiographic diagnostic of scoliosis relied on the presence of a lateral curve to the spine greater than 10° with vertebral rotation.30 The demographic data and clinical phenotypes were retrieved from the electronic medical records. A pedigree investigation was conducted for all patients, and the principle of performing genetic testing for all available parents was followed. Genome sequencing was performed as singleton, duo, and trio sequencing depending on availability of biological parental samples. All individuals provided informed consent prior to the genetic analyses.
Whole-Genome Sequencing and Bioinformatics Analysis
To obtain genomic DNA, peripheral blood samples were collected from probands with or without their parents. Shriners Genomics Institute performed the entire genome sequencing process, and analysis and interpretation were performed using the RareVision™ system (Inocras Inc., San Diego, CA, USA). Genomic DNA was extracted from blood samples using the Allprep DNA/RNA kits (Qiagen, Venlo, Netherlands). DNA libraries were prepared using TruSeq DNA PCR-Free Library Prep Kits (Illumina) and sequenced on the Illumina NovaSeq6000 platform (Illumina) with an average depth of coverage of 30×. The obtained genome sequences were aligned to the human reference genome (GRCh38) using the BWA-MEM algorithm. PCR duplicates were removed using SAMBLASTER. The initial mutation calling for base substitutions and short indels was performed using HaplotypeCaller2 and Strelka2, respectively. Structural variations were identified using Manta. Variants were filtered and annotated. Additionally, to focus on rare variants, we selected variants with minor allele frequency (MAF) <0.01 in 1000 Genomes Project (1000GP),31 The Genome Aggregation Database (gnomAD),32 and The Exome Aggregation Consortium (ExAC).33 Pathogenicity was defined based on the clinical information and genetic variant information of the test subjects, following the guidelines published by the American College of Medical Genetics and Genomics (ACMG),34 ClinVar annotations and functional prediction scores: variants reported to be either (1) pathogenic, (2) likely pathogenic, or (3) uncertain significance. The pathogenicity prediction was further enhanced by using in-house-developed software that automatically integrates updated databases for more accurate analysis.
Candidate Variant Prioritization
To prioritize candidate variants, we considered (1) previously reported associations between the gene and AIS, (2) predicted effect of the mutation, and (3) familial segregation. For (1), we employed a filter using Human Phenotype Ontology (HPO) term “Scoliosis”, and conducted a gene-based review in OMIM and PubMed to sort the candidate genes into the following categories: Known AIS genes (K-AIS) includes 12 genes with strong evidence for AIS association in previous studies (ADGRG6/GPR126,35 AKAP2,36 BNC2,37 CHD7,38,39 COL11A1,40 COL11A2,41 FAT3,42 FBN1,43 FBN2,43 HSPG2,44 KIF7,45 LBX1,46 and POC547). Autosomal dominant/recessive syndromic AIS genes (S-AD-AIS/S-AR-AIS) includes genes known to be associated with an OMIM-listed genetic syndrome where scoliosis is a reported symptom or common comorbidity. Genes with possible association to AIS (Assc-AIS) includes genes with previous evidence for association with AIS, other skeletal abnormalities, or other AIS-associated genes. All remaining genes were considered “NKA” – no known association.
For (2) predicted effect of the mutation, we grouped structural, frameshift, start loss, stop gain (nonsense), stop loss, and splice-site variants as protein truncating variants (PTVs). Missense variants that were reported as pathogenic/likely pathogenic in ClinVar, and/or predicted to be pathogenic/likely pathogenic by the ACMG guidelines were grouped as P/LP. Variants that were predicted to be benign/likely benign based on the ACMG guidelines, variants that were predicted to be benign/tolerated/likely benign by the majority of available predictive tools and databases, and variants that were marked as benign/likely benign in ClinVar were grouped as B/LB. The remaining candidate variants were considered variants of uncertain significance, or “VUS”.
For (3) familial segregation, variants categorized based on family type and mode of inheritance. Our cohort included trios, duos, and singletons. Note that the phenotype information was unavailable for some participating parents. These cases were evaluated based on the number of parents with known phenotype. Following modes of inheritance were considered: De novo variants were absent in both unaffected parents, and heterozygous in the proband. Inherited homozygous variants were heterozygous in both unaffected parents, and homozygous in the proband. Inherited heterozygous variants were absent in the unaffected parent, and heterozygous in the affected parent and the proband. Compound heterozygous variants were defined as a two different inherited heterozygous or de novo variants present in the same gene in trans. For unknown parental genotype and phenotypes, all possible options were considered, and the candidate variants were evaluated for each combination to select the strongest option. Variants that did not fit the criteria for de novo, heterozygous, homozygous, or compound heterozygous based on available parental genotype and phenotypes were considered non-segregating. Based on these criteria, the candidate genes were sorted into categories as described in the Table 1 as follows: 1) solved case, 2) nearly solved case, requires missing parent genotype, 3) case with strong candidate variant, 4) case with moderate candidate variant.
Principal Component Analysis
Principal component analysis (PCA) was carried out in PLINK version 1.90b6.1148,49 using Phase 3 1000GP data.31 VCF files were converted into PLINK format. Autosomal variants were filtered for Hardy-Weinberg equilibrium (p < 0.001), MAF >5%, and maximum missing genotype rate of 25%. Proband and 1000GP cohort data were then merged and pruned to remove variants with MAF <10%, missing genotype rate greater than 5%, and pruned for linkage disequilibrium (LD) using PLINK –indep-pairwise 50 5 0.15. Triallelic and palindromic variants were also removed. PCA was run in PLINK using the –pca flag and the first two principal components were plotted in R.
Gene Ontology Analysis
Genes that include pathogenic rare variants and AIS-related genes identified in the sequencing data were selected as candidate genes. Gene set overrepresentation analyses were performed on MAF < 0.05 gene lists with duplicate genes removed from the combined list from each family. We performed a molecular pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG)50 to determine how candidate genes are networked and enriched within the pathways. Gene Ontology (GO)51,52 terms derived from cellular component and molecular function were also annotated to figure out protein-protein interactions. GO enrichment accessible at https://geneontology.org/ using GO enrichment analysis by PANTHER. The significance of an enrichment result was set at false discovery rate (FDR) <0.05 for all functional annotations. The same input gene lists as used for GO enrichment analysis were used in Enrichr, a gene enrichment software developed by the Ma’ayan laboratory53 accessible at https://maayanlab.cloud/enrichr-kg. We report results from the 2021 GO term Biological Process and 2021 KEGG Human pathways.
SNP Association Analysis
The population of each proband was inferred by k-means clustering (KNN). Briefly, KNN was run on the first 3 principal components from the PCA output, using 1000GP data as the training set, and assuming k=5 for the 5 superpopulations in 1000GP (African or AFR, Admixed American or AMR, East Asian or EAS, European or EUR and Southeast Asian or SAS). For both probands and 1000GP samples, only the individuals with AMR or EUR backgrounds were included in the association analysis due to the rarity of other population groups in our cohort. An allelic test of association was performed for 15 SNPs previously associated with AIS7 using PLINK version 1.90b6.
RESULTS
Baseline Characteristics
The cohort comprised 204 samples from 103 families, consisting of 103 AIS probands and 101 relatives (16 singletons, 65 mother-only duos, 8 father-only duos, and 14 trios). Among the 103 probands, 81 patients (78.6%) were female, whereas 22 (21.4%) were male and 87 patients (84.5%) were idiopathic, whereas 16 (15.5%) were familial AIS with affected mother. Therefore, the analysis of candidate variants included 119 AIS patients (103 probands and 16 affected mothers) from 103 families. PCA with the 1000GP data shows that most of the participants including AIS probands and their parents have European and/or Admixed American ancestry (Figure 1).
Principal component analysis (PCA) of the AIS cohort (A: probands, B: parents) combined with 1000GP. AIS cohort from Shriners Hospital are represented in pink. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian; Shrn, Shriners.
Candidate Variants
The average for mean genomic coverage was 40.7x and the average duplication rate was 7.9%. We identified clinically relevant candidate variants in 20/119 (16.8%) patients from 18/103 (17.5%) families, and potentially relevant strong or moderate candidate variants in another 73/119 (61.3%) patients from 61/103 (59.2%) families. WGS identified 137 genetic variants likely or highly likely to account for disease presentation. Figure 2A provides a summary of variant information identified across all probands. Missense variants constituted the majority of filtered variants (60.1%), followed by frameshift (16.7%), stop gain (8.0%), structural variations (7.2%), intronic variants (4.3%), and splicing variants (3.6%). A total of 27 variants were identified as ‘pathogenic’ or ‘likely pathogenic’ per ACMG guidelines. These genes include ABCC9, ABL2, ACSF3, ALMS1, ALPK3, ANO5, ARID1B, ATP1A3, ATP7B, CUL7, CYP7B1, FBN1, GALT, GCDH, GNAI1, MFN2, MGAT5, NRCAM, NSD2, OTUD6B, PIGV, RECQL3, RNU4ATAC, SERAC1, SMARCA4, and TNPO3.
(A) Pie-chart representing the repartition of category types of the variants found in the AIS cohort. (B) Enrichr visualization of biological process Gene Ontology (GO) terms. The most significant GO terms are represented in pink, Kyoto Encyclopedia of Genes and Genomes (KEGG) terms in grey, and genes are represented in green. For easier visualization, the nodes have been pooled into two categories: extracellular matrix organization and muscle contraction.
A mutation burden analysis across all known AIS-associated genes was conducted to assess for enrichment of rare variants in the affected cohort compared with unaffected cohort. The number of rare (MAF<0.1%), nonsynonymous variants in these population is presented in Supplementary Table 1 and rare nonsynonymous variants were identified in 10 known AIS-associated genes (BNC2, KIF7, COL11A1/2, FBN1/2, HSPG2, POC5, ADGRG6, and FAT3). BNC2 and KIF7 only had variants in affected individuals and variants in BNC2, the gene that carries the most significant risk polymorphisms, were all benign/likely benign.
The list of genetic variants that passed the filtering criteria of our analysis is presented in Table 2 for cases with solved or strong candidate variants, including categories 1−3, and Table 3 for cases with moderate candidate variants, category 4, respectively. In total, 137 candidate genes were identified, 116 of which were unique, 13 (ATP7B, COL6A3, CUL7, DPP6, FAT4, FBN2, LAMA2, MYH7, NSD2, SBF1, SETD5, TRIO, and TRPS1) were shared by two families, and 4 (GJB2, HSPG2, MYH2, and RYR1) were shared by three families. Candidate variants included coding and noncoding point mutations, along with structural variants, showing the utility of WGS. Candidate genes included previously AIS-associated genes (e.g. CHD7, COL11A1/2, FBN1/2, HSPG2, KIF7), as well as genes associated with other musculoskeletal and developmental syndromes where scoliosis is a known phenotype (e.g. RYR1, GJB2, MYH2, MYH7).
Gene Set Overrepresentation Analyses
To understand the connections between the candidate genes, we created an interaction map using the STRING database.54 We saw that 87 out of 137 candidate genes had at least one interactor (Supplementary Figure 1A). Furthermore, we identified 3 interconnected clusters that included the majority of the candidate genes (Supplementary Figure 1B). Cluster 1 included DNA-interacting proteins, including gene expression regulators like ARID1B, CHD3, CREBBP, and KMT2C. Cluster 2 included proteins involved in muscle contraction like ACTN2, MYH2, MYH7, and SCN1A. Cluster 3 included extracellular matrix (ECM) proteins like COL11A1, FBN1, HSPG2, and LAMA2.
To understand the biological functions of identified genes which were overrepresented with damaging variants within the AIS families, we performed an over-representation analysis based on the Mammalian Phenotype Ontology database of our resulting filtered gene lists. We entered the MAF <0.05 gene lists from individual patients into GO enrichment analysis to identify overrepresented gene ontologies: Cellular Component terms in Table 3 and Molecular Function terms in Table 4. The most overrepresented GO Cellular Component categories were “Z disc” (p = 9.21 × 10-5, 11.21 fold enrichment), “Endoplasmic reticulum lumen” (p = 5.17 × 10-3, 5.22 fold enrichment), “laminin-11 complex” (p = 6.87 × 10-3, >100 fold enrichment), and “protein complex involved in cell-matrix adhesion” (p = 7.67 × 10-3, 32.14 fold enrichment). The most overrepresented GO Molecular Function categories were “ATP-dependent activity” (p = 3.00 × 10-3, 4.34 fold enrichment), “extracellular matrix constituent conferring elasticity” (p = 6.36 × 10-3, 54.64 fold enrichment), “ATP binding” (p = 6.46 × 10-3, 2.68 fold enrichment), and “integrin binding” (p = 9.24 × 10-3, 7.97 fold enrichment). Figure 2B displays a visual representation of Enrichr and the first ten clusters in terms of p-value were linked to binding activities. The top enriched GO Biological Process term was “extracellular matrix organization” (p = 1.12 × 10-8) and “muscle contraction” (p = 2.28 × 10-8). Furthermore, the top enriched KEGG term was “ECM-receptor interaction” (p = 3.02 × 10-4).
Allelic Test of Association for SNPs
We attempted to compare SNPs between our patient cohort and a general population cohort of the similar ethnic background to determine their association with AIS disease. Pérez-Machado et al.7 proposed SNPs with the most statistically significant associations with AIS using 10 previous studies and 1387 published associations reported up to 2019. We performed allelic tests of association in our patient cohort for the 15 SNPs proposed by Pérez-Machado et al. Table 5 lists the genes and SNPs of these results and showed a significant association of genetic variants rs12946942, located between SOX9 and KCNJ2, with AIS. A significant association of loci (rs10756785 and rs3904778) of BNC2 gene and loci (rs7633294) of MAGI1 with AIS were also found.
Representative Cases
The genetic variant information of representative cases in which we were able to identify the causative gene of AIS through WGS are presented. Patient 14-0356 was a trio with unaffected parents and was identified with a ∼3.0 Mbp deletion encompassing four low copy repeats (LCR) A to D in the chromosome 22q11.2 region (Figure 3). Comparison of coverage of the 22q11 region in this family showed that this variant was a de novo deletion occurred only in the proband. 22q11.2 proximal deletions are associated with DiGeorge syndrome (MIM 188400), which presents with several physiological and anatomical phenotypes, including scoliosis.55 The TBX1 gene is located at region 22q11.2, and an haploinsufficiency of TBX1 has been identified as the driver of for most of the physical malformations in patients with DiGeorge syndrome.56
(A) Schematic overview of the chromosome 22q11.2 region from the UCSC Genome Browser. The region of the 22q11.2 chromosomal deletion in this patient spans approximately 3 Mb and contains four low copy repeats (LCR A-LCR D) distributed across this region. (B) Coverage of chromosome 22q11.2 region in this trio family (proband in purple, mother in yellow, and father in blue). Y-axis shows read depth and X-axis shows position in Kbs. De novo deletion region in proband is shown with red lines.
Patient 14-0105 was also a trio with unaffected parents and the genetic results can be seen in Figure 4. A frameshift variant (p.Glu1344ArgfsTer91) in the NSD2 gene on chromosome 4 (chr4:1978831:G>GC) was identified with WGS. This variant was not found in the patient’s parents, indicating that it was a de novo event. Haploinsufficiency due to de novo heterozygous frameshift and missense variants in NSD2 has been reported to cause a diverse spectrum of skeletal abnormalities, including scoliosis.57
(A) Schematic figure from the UCSC Genome Browser. Note the mutation region in the NSD2 gene on chromosome 4 in this patient. (B) Integrative Genomics Viewer image of trios (proband-mother-father). Note the de novo event, chr4:1,978,831:G>GC, results in a frameshift variant in NSD2 (p.Glu1344ArgfsTer91).
DISCUSSION
In the present study, we investigated the genetic landscape in 103 AIS probands utilizing WGS. To the best of our knowledge, this is the largest study to explore the comprehensive genomic landscape of an AIS cohort using WGS.
The lack of overlap in specific genes between different families suggests that AIS is a highly polygenic disorder involving variants in multiple genes, potentially suggesting that a combination of genetic/epigenetic and environmental factors are involved in the expression of different phenotypes. GO enrichment analysis revealed that the genes identified in this AIS cohort are mainly involved in “extracellular matrix organization” and “muscle contraction”. Allelic tests of association for SNPs showed that SNP risk alleles located between SOX9 and KCNJ2, BNC2, and MACl1 genes were significantly associated with AIS. A majority of the candidate genes in our list were associated with other complex syndromes with severe phenotypes, including but not limited to scoliosis. This could indicate that patients diagnosed with AIS may be presenting with milder cases of more severe syndromes.
Genes related to ECM organization were the first enriched category in our dataset. The ECM is a complex and dynamic component of all tissues that is made up of components including fibrillar proteins, glycosaminoglycans, proteoglycans, minerals, and related proteins.58 HSPG2, FBN1, and FBN2, which were identified as candidate ECM genes in our patient cohort, have also been reported to be associated with AIS in specific cohorts,43,44 and several collagen genes (COL6A3, COL11A2, COL5A1, COL11A1) were also identified in this cohort. Previous researchers have suggested that the main pathogenesis of AIS is asymmetrical bone growth,59 and histological studies have shown decreased chondrogenesis, disorganized columnation, and premature cessation of growth in the cartilaginous growth plate of the vertebral body.60 The researchers proposed that growth plate chondrocytes indirectly regulate cellular activities, such as shape and volume preferential changes, through alterations in ECM synthesis and degradation.61 These previous studies in individuals with AIS, coupled with our genetic findings, suggest that ECM organization genes may play an important role in the development of the abnormal spinal curvature.
Genes related to muscle contraction were another enriched category in our dataset. MYH2, DES, RYR1, ACTN2, MYH7, and TPM3, which are involved in muscle contraction, were identified in our AIS cohort. There have been various studies on the asymmetry of the paraspinalis muscles in patients with AIS and recent literature has reported dominance of electromyography activity on the convex side of the scoliotic curve.62–64 The genetic findings from our group and others are supportive of a functional role for muscle contraction in AIS etiology. Although the pathogenesis of AIS is still not fully understood, we speculate that the asymmetry of the activity of structures involved in paraspinal muscle contraction as well as the ECM comprising the growth plate of the vertebral body are intricately connected within musculoskeletal tissues and are involved in the biomechanical system, which is one of the important etiologies of AIS.
The majority of previous research on the genetic basis of AIS were genome-wide association studies, which identified a number of SNPs associated with AIS. Although we had a small cohort, we were able to leverage the ethnic background of our participants to replicate the association of 4 known AIS SNPs. The rs12946942 locus at 17q24.3 is correlated with curve severity.65 It is located between SOX9, a pivotal transcription factor involved in skeletal development and ECM remodeling,66,67 and KCNJ2, which encodes a potassium channel involved in muscle contraction, and associated with skeletal malformations including progressive scoliosis.68,69 The rs7633294 at 3p14.1 is located near MAGI1, which encodes a scaffolding molecule important for the stabilization of cadherin-mediated cell interactions.69,70,70 The loci rs10756785 and rs3904778 are located within BNC2, which encodes a highly conserved zinc-finger protein involved in ECM regulation.37,71–73 Additionally, we detected 3 rare nonsynonymous variants with predicted benign effect in affected individuals, while no rare nonsynonymous variants were detected in unaffected individuals. These findings further support that the regulation of ECM structure and muscle contraction through transcriptional mechanisms play a significant role in AIS. Given the delicacy of these systems, identifying functional regulatory variants will be important to understand the precise mechanisms that lead to disordered phenotypes. Continued efforts in utilizing WGS combined with regulatory variant annotation have been successful in identifying candidate causative variants in other complex disorders like cardiomyopathy and autism spectrum disorder,74,75 and similar approaches should be employed to better understand the etiology of AIS.
In our study, 6% of all variants were identified as intronic variants and 4% as SVs as the causative variants of AIS. The identification of intronic variants and SVs require sequencing of noncoding regions of the genome. The panel-based sequencing and WES are limited in their ability to detect such variants, particularly those with breakpoints in intronic or intergenic regions, whereas WGS can overcome this limitation by covering all exonic and intronic regions of the genome.76,77 We believe that genetic approaches using WGS in these population will help us to understand the different phenotypes of patients in clinical practice and to characterize the genetic components of AIS pathophysiology. It is noteworthy that a variety of syndromic genes were identified in our patient cohort, suggesting that AIS may present as a milder presentation of more severe forms of syndromes. While many studies have reported on the important role of genetics in the initiation of AIS, it is also important to note that environmental factors and epigenetic influences are also deeply involved in the progression of AIS.7,78
This study had some limitations. The study population was predominantly of European and American ancestry. To capture the complete genetic landscape of AIS, WGS studies of larger cohorts including individuals with diverse ancestry will be required. Secondly, the majority of our identified variants are heterozygous, and without functional testing, it is unclear if these variants have a true dominant-negative effect on the resulting protein. Additional functional studies of specific variants will be warranted to demonstrate causality. Thirdly, analyses of clinical phenotypes of AIS, such as Cobb angle and location of the curve, are lacking. Further studies to determine the association between clinical manifestations and genotype will provide a deeper understanding of the role of genetic mechanisms underlying AIS. However, despite these limitations, the present study has the strength of being the first to investigate WGS results of the largest AIS cohort.
In conclusion, this study investigated the genetic landscape in the largest AIS cohort utilizing WGS. We observed an overrepresentation of variants in ECM organization and muscle contraction categories, with few specific genes shared across families. Overall, this study revealed that the genetic etiology of AIS is highly polygenic and that WGS has a greater potential to detect a wider spectrum of genetic variants, such as SVs and intronic variants. Further research with clinical phenotypes and functional studies of genetic variants are required to draw definitive conclusions regarding the genetic mechanisms in AIS pathogenesis.
Data and code availability
The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to the ethical and privacy nature of the data. The RareVision™ algorithm and associated in-house scripts are proprietary to Inocras Inc.
Author contributions
Concept and design: I.O.T., E.K.L., S.L., K.S.; acquisition, analysis, and interpretation of data: I.O.T., E.K.L., A.G.; drafting of the manuscript: I.O.T., E.K.L.; critical revision of the manuscript for important intellectual content: S.L., K.S.; statistical analysis: I.O.T., E.K.L.; funding acquisition: E.K.L., S.L.; administrative, technical, and material support: Y.L., J-Y.K., W.L., A.G.; supervision: S.L., K.S.
Declaration of interests
The authors declare no competing interests.
Acknowledgments
This study was supported by a research grant of the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (Information and Communication Technology, NRF-2021R1F1A1045417). The sponsor or funding organization had no role in the design or conduct of this research.
Footnotes
Authors
Islam Oguz Tuncay oguztuncay{at}inocras.com
Eun Kyoung Lee eklee{at}inocras.com
Anxhela Gustafson agustafson{at}shrinenet.org
Yoonsuh Lee yslee{at}inocras.com
Dawoon Jung dwjung{at}inocras.com
June-Young Koh jy.koh{at}inocras.com
Wonchul Lee wclee47{at}inocras.com
Sangmoon Lee slee{at}inocras.com
Kamran Shazand kshazand{at}shrinenet.org